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Abstract. This paper focuses on the development of a two-level preconditioner based on a fully algebraical en- 
hancement of a Schwarz domain decomposition method. We consider the purely divergence of a Restricted Additive 
Scwharz iterative process leading to the preconditioner developped by X.-C. Cai and M. Sarkis in SIAM Journal of 
Scientific Computing, Vol. 21 no. 2, 1999. The convergence of vectorial sequence of traces of this process on the 
artificial interface can be accelerated by an Aitken acceleration technique as proposed in the work of M. Gai'bey and 
D. Tromeur-Dervout, in International Journal for Numerical Methods in Fluids, Vol. 40, no. 12,2002. We propose 
a formulation of the Aitken-Schwarz technique as a preconditioning technique called Aitken-RAsQ The Aitken ac- 
celeration is perf oiTned in a reduced space to save computing or peiTnit fully algebraic computation of the accelerated 
solution without knowledge of the underlying equations. A convergence study of the Aitken-RAS preconditioner is 
proposed also application on industrial problem. 
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1. Introduction. The convergence rate of a Krylov method such as GCR and GMRES, 
developed by Einsenstat & al ifTSl . to solve a linear system Au ^ f, A= (a.y) e M'"^™,ii G 
R™, b e K™, depends on the matrix eigenvalues distribution. They derived the convergence 
rate of the GCR and GMRES methods as for the GCR: 

\\n\\2 < [1 - X (M^r^fZ-^ (j,,. f'\\ro\\2 < [1 - -^,]nro\\2 (1.1) 

whereM = (yl+A*)/2 and i? = {A-A^)/2. The GMRES method is mathematically equiv- 
alent to the ORTHORES algorithm developed by Young and Cea ll40l . Its convergence rate 
follows the same formula as (II. lb if A is positive real. Otherwise when A is diagonalisable 
A = XAX~^ , the convergence of the GMRES method depends on the distribution of the 
eigenvalues and the condition number k{X) as follows. Let Ai, . . . , be the eigenvalues of 
A with a non-positive real part, and let A^i, . . . , A„ those with a positive real part belonging 
to the circle centred in C > with a radius R with C > R. Then the GMRES convergence 
rate can be written as: 



\ni\\2 < ii{x) 
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\\ro\\2 (1.2) 



with D = maxi=i,^;j=^i_„ |Ai — Aj| and d = mini=i_p |A,|. (ll.ll l and ( 11.21) . and, gener- 
ally speaking, decreases when the condition number K2{A) = ||A||2||A^^||2 of the non- 
singular matrix A increases. This implies the need to reduce the scattering of the eigenval- 
ues distribution in the complex plane in order to improve the convergence rate. This is the 
goal of a preconditioning technique. The left-preconditioning techniques consist to solve 
M^^Au = M^^f such that K2{M^^A) << K2{A). In this work we focus on the Schwarz 



This paper extends the proposition of the ARAS preconditioning technique published in T. Dufaud and D. 
Tromeur-Dervout, Aitken's acceleration of the Resctricted Additive Schwai'z preconditioning using coarse approxi- 
mations on the interface. C. R. Math. Acad. Sci. Paris, Vol. 348. no. 13-14, pages 821-824, 2010, by developing 
the building of the preconditioner and its theoretical properties. Moreover, it focuses on a fully algebraic technique 
based on SVD to approximate the solutions. Finally, results on industrial hnear systems are provided. 
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preconditioning techniques and the preconditioning techniques that are related to the Schur 
complement of the matrix A. 

Let us first recall some state of the art about the Schwarz and Aitken-Schwarz solvers, 
and preconditioners based on the Restricted Additive Schwarz (RAS). 

1.1. State of the art of Schwarz and Aitken techniques. First, lets us consider the 
Generalized Schwarz Alternating Method introduced by Engquist and Zao lfT9l that gathers 
several Schwarz techniques (see Quarteroni and Valli [35 |). For sake of simplicity, let us 
consider the case where the whole domain il is decomposed into two sub-domains 57i and 
il2, with overlapping or not, defining two artificial boundaries Fi, Let flu ~ ili\il2, 
^22 — ^2\^i if there is an overlap. Let L{x) be the continuous operator associated with the 
discrete operator A. It can be written in its multiplicative version as: 



Algorithm 1 GSAM: Multiplicative version 



1: DO until convergence 
2: Solve 



L(x)uf+i(a;)-/(.T), Vxel^i, (1.3) 

ul''+\x) ^ g{x), e dniXTi, (1.4) 

A,„2„+i ^ x^ dul^^'i^) ^ ^ ^^^^pxl^ ^ ^^ ^^ 

oni oni 



3: Solve 



= f{x), Vx e ^2, (1.6) 

uf'+^ix) = g{x), \fx e dn2\r2, (1.7) 

A 2n+2 I x {x) 2n+l . \ ^'^l ^ (^) vv ^ "P /-l C\ 

A2M2 + A2 t: =A2Wi + A2 , Va::er2. (L8) 

0712 on2 



4: Enddo 



where A^ are some operators and Ai are constants. 

According to the specific choice of the operators A^ and the values of scalars A;, we 
obtain the family of Schwarz domain decomposition techniques: 
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Neumann-Dirichlet (Marini & Quarteronni [33 1) 
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Modified Schwarz (Lions Bli) 



Table 1.1 

Derived methods obtained from the specific choices of the operators Ai and the vahies of scalars Xi in the GSAM. 



If Al = A2 = / and Ai = A2 = then the above multiplicative version is the classical 
Multiplicative Schwarz. If Ai = A2 = constant and Ai = A2 = 1 then it is the modified 
Schwarz proposed by Lions in ll32l . 

Engquist and Zao |fT9l showed that with an appropriate choice of the operators A^ this 
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domain decomposition method converges in two iterations. They established the proposition 
that follows: 

Proposition 1.1. If h-i (or K2} is the Dirichlet to Neumann operator at the artificial 
boundary Ti (or T2) for the corresponding homogeneous PDE in D,2 (or fli) with homo- 
geneous boundary condition on dCl2 H dQ (or dQi H dCl) then the Generalized Schwarz 
Alternating method converges in two steps. 

The GSAM method converges in two steps if the Dirichlet-Neumann operators A^, i = 
1, 2, are available. These operators are not local to a sub-domain but they link up together all 
the sub-domains. In practice, some approximations defined algebraically of these operators 
ai-e used (see Chevalier & Nataf [111, Gander & al [23], Gerard o-Giorda & Nataf ll27l ). 

In the Aitken-Schwarz methodology introduced by Garbey & Tromeur-Dervout Il25ll26l . 
only the convergence property of the Schwarz method is used. Consequently, no direct ap- 
proximation of the Dirichlet-Neumann map is used, but an approximation of the operator of 
error linked to this Dirichlet-Neumann map is performed. This Aitken-Schwarz methodology 
is based on the purely linear convergence for the Schwarz Alternating method when the local 
operators are linear operators. 

Definition 1.2. Let (^(u'll^^_^ ^ ~ w*^^ be a vectorial sequence converging to- 
ward n ~ ^ purely linearly if 

u^+^ -^^p{u^ -e) (1.9) 

where P G M"^" is a constant error's transfer operator independent of k and non-singular 
We assume that there exists a norm 1 1.| | such as | |P| | < 1. Then P and ^ can be determined 
from n + 1 iterations , using the equations: 

(u'^'+i - u\ ...y -u')=P - u''-\ - u°) (1.10) 

So, if (u" — u"^^, — u") is non-singular P can be written as : 

F= (^u" -u"'\...,u^ -u°y^ (1.11) 

Then if | |P| | < 1, (/ — P) is non singular and it is possible to compute ^ as 

e = (/ - P)"^ - Pu") (1.12) 

For domain decomposition methods, the vectorial sequences u" corresponds to the iterated 
solution at the subdomains artificial interfaces. To apply directly the Aitken's acceleration 
in the vectorial case, we have to construct the matrix P or an approximation of it, and to 
apply the Aitken's acceleration (11.12b . Algorithm|2]describes the acceleration written in the 
canonical base of K" ("physical space"). 



Algorithm 2 Vectorial Aitken's acceleration in the physical space 
Require: Q : M" —5- M" an iterative method having a pure linear convergence 
Require: (M''")i<fe<n+i, n+1 successive iterates of Q starting from an arbitrary initial 
guess M° 

1: Form E'' = it'^+i -u'', <k<n 
2: if ...,E"]is invertible then 

3: p =[£:",..., [£:"-!,..., 

4: U°° ^ [In - P)-l(?i"+l - Pm") 

5: end if 
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The drawback of Algorithm|2]is to be limited to a sequence of small vector size because 
it needs a number of iterations related to the vector size n. In order to overcome this difficulty, 
some approximation of the error transfer operator P is proposed using some coarse approx- 
imation spaces to represent the solution. Garbey [24] proposed to write the solution in the 
eigenbasis associated to the part of the separable operator associated to the direction parallel 
to the artificial interfaces, or with a coarse approximation of the sinus or cosinus expansion 
of the solution at the interface. Tromeur-Dervout [39] proposed to build an approximation 
space based on the Singular Value Decomposition (SVD) of the interface solutions of the 
Schwarz iterates. This last approximation override any constraints about separability of the 
linear operator A and mesh considerations. 

Nevertheless this last techniques fail when: 

• the iterative process based on domain decomposition diverges too fast, 

• the local solutions are inaccurately solved leading to a less numerically efficient 
acceleration by the Aitken's process. 

In such cases the Aitken-Schwarz method as solver is no longer suitable should be con- 
sidered as preconditioner of a Krylov method. The purpose of this paper is to detail and to 
extend the preconditioners based on Schwarz domain decomposition accelerated by Aitken's 
techniques developped by Dufaud & Tromeur-Dervout [T6l with building and approximation 
of the matrix P arising from the SVD approximation of the Schwarz interface solutions. 

1.2. State of the art of preconditioners based on RAS. The techniques of precondi- 
tioning that are based on domain decomposition of Schwarz's type have been widely devel- 
oped this last decade and accelerated multiplicative Schwarz has been "a consistently good 
performer" as said Cai & al [7J. The first type of domain decomposition preconditioning to 
appear was domain decomposition based on substructuring technique of Bramble & al lH] 
followed by the Additive Schwarz (AS) preconditioning of Dryja & Widlund fT4l, Gropp & 
Keyes 1281 . It is built from the adjacency graph G = {W, E) of A, where W ~ {1, 2, to} 
and E = : ^ 0} are the edges and vertices of G. Starting with a non-overlapping 

partition W — Uf^j^W^.o and S > given, the overlapping partition {W^^a} is obtained 
defining p partitions Wi^s ^ Wi^s-i by including all the immediate neighbouring vertices of 
the vertices in the partition Wi,s-i- Then the restriction operator Ri s : W — > Wi^s defines 
the local operator 5 = Ri^sARfg,Ai^s e R^-.^x'"'.-' on Wij. The AS preconditioning 
p 

writes: M^ls = Y,RlsAllR^,s■ 

i=l 

Cai & Sarkis IS) introduced the restriction matrix Ri^s on a non-overlapping sub-domain 
Wifi, and then derived the Restricted Additive Schwarz (RAS) iterative process as: 

p 

- u"^-^ + M^\s , (/ - Au^-') , with Af^i^ , = RlsKlR^,s (1.13) 

1=1 

They showed experimentally that the RAS exhibits a faster convergence than the AS, as Ef- 
stathiou & Gander demonstrated in ifTTI for the Poisson problem, leading to a better precon- 
ditioning that depends on the number of sub-domains. We note, that Cai & al |6| develop 
extensions of RAS for symmetric positive definite problems using the so-called harmonic 
overlaps (RASHO). 

When it is applied to linear problems, the RAS has a pure linear rate of convergence / 
divergence. When it converges, its convergence can be enhanced with optimized boundary 
conditions giving the ORAS method of St Cyr & al Il36l . In this case, the transmission condi- 
tion in GSAM takes the form to be the normal derivative ( Neumann boundary condition). 
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Then, an optimisation problem is done to minimize the ampUfication factor of the Schwarz 
method with this Robin coefficient in the Fourier space. The drawback of this method is 
that it can only be applied to separable operators, and need regular step size and periodic 
boundary conditions in the direction orthogonal to the interface to be mathematically valid. 
Nevertheless, if it is not the case, the parameters in the Robin conditions are set based on this 
postulate and applied in the current case. 

This Neumann-Dirichlet map is related to the Schur complement of the discrete operator 
(see for example Natarajan Il34l or Steinbach |37|). 

Saad & Li f3U\ introduced the SchurRAS method based on the ILU factorisation of the 
local operators Aij present in the RAS method. Magoules & al 1 22 1 introduced the patch sub- 
structuring methods and demonstrated its equivalence with the overlapping Schwarz methods. 
In this work the Dirichlet and Neumann boundary conditions present in the Schwarz alter- 
nated method have been replaced by Robin boundary conditions to enhance the convergence 
rate. The patch method consists in introducing an overlap in the Schur complement technique. 
These two techniques take care of the data locality in order to avoid global communications 
involving all sub-domains. Nevertheless, as in the RAS preconditioning technique, the main 
drawback of this locality is a decreasing of the preconditioning efficiency with respect to the 
number of sub-domains. 

Another related work that takes care to involve all the sub-domains present in the Schur 
complement is the substructuring method with a suitable preconditioner for the reduced equa- 
tion of Bramble & al 1 5 1 , Carvalho & al 1 9 1 1 1 1 , Khoromskij & Wittum 1 29 1 . Let us describe 
the iterative substructuring method and the preconditioning by the additive Schwarz precon- 
ditioner for the Schur complement reduced equation on the interface problem designed by 
|^ |. Let r be the set of all the indices of the mesh points which belong to the interfaces 
between the sub-domains. Grouping together the unknowns associated to points of the mesh 
corresponding to T into the vector Mr and the ones corresponding to the other unknowns (cor- 
responding to the points of mesh associated to the interior I of sub-domains) into the vector 
uj, we get the reordered problem: 



All AiA (uA ^ ffi 
An Ar J Ur/ \fr 



(1.14) 



Eliminating the unknowns uj from the second block row of (I1.14t leads to the following 
reduced equation for ur: 

Sur^fr-AriAjj'fi, (1.15) 

where 

S = Arr-AriAjj'Air (1.16) 

is the Schur complement of the matrix Ajj in A. Let be Ti = dfliXdfl. Let Rr^ : F be 
the canonical pointwise restriction which maps vectors on F into defined vectors on F^, and 
let be -Rp : F^ F its transposed. The Schur complement matrix (11.16) can also be written 
as: 

p 

S ^^R^^S'-'^Rr, (1.17) 

i=l 

where 

^« = 4;'r. - ^r.. A^M,r. (1.18) 
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is referred to the local Schur complement associated with the sub-domain fli. S*''^ that in- 
volves the submatrices from the local matrix A'^^'> which is defined as 



Then they defined a BPS (Bramble, Pasciak & Schatz (5)) preconditioner which is based 
on the set V which gathers the cross points between sub-domains (i.e points that belong to 
more than two sub-domains) and the sets Ei of interface points (without the cross points in 
V) 

E, = {dQj n dQi) - V (1.20) 

rn 

T={[jE,)UV (1.21) 

The operator Ri defined the standard pointwise restriction of nodal values on Ei while op- 
erator Rv defined the canonical restriction on V. Then a coarse mesh is associated with the 
sub-domains and an interpolation operator R^ is defined. This operator corresponds to the 
linear interpolation between two adjacent cross points Vj Vi in order to define values on the 
edge Ei . This allows to define Ah the Galerkin cross grid operator Ah = RAR^ . They 
deduced a very close variant of BPS preconditioner that can be written as: 

Mbps = RJSuR^ + R^A'^^R (1.22) 

Ei 

They defined a coarse-space operator 

Ko^RoSRl (1.23) 

where Rq : U — > J7o is a restriction operator which maps full vector of U into vector in Uq 
where Uq is a q-dimensional subspace of U the algebraical space of nodal vectors where the 
Schur complement matrix is defined. 

Mbps = J2 ^f^"^. + R^K'Ro (1-24) 

Ei 

where Sa is an approximation of Sa. The definition of Uq gives different preconditioners: 
Vertex-based coarse space, sub-domain-based coarse space, edge-based coarse space, de- 
pending on the set of points of the interface F that are involved. From the implementation 
practical point of view, the coarse matrix Aq is constructed once and involve matrix vector 
products of the local Schur complement only. 

1. The advantages of this method, is to defined the two-level preconditioner only on 
the interface F. It is intimately related to the Schur complement operator defined on 
the interface. 

2. The drawback is to have to define a priori the coarse space Uq without any knowl- 
edge of the solution behavior Consequently it can be expensive in term of number 
of coarse space vectors, specifically for 3D problems where cross-points between 
sub-domains in 2D, become cross-regions between sub-domains in 3D . 

Our approach will follow the same spirit as this two-level preconditioning working only 
on the interface. But we still work on the system Ax = b and not Sur ~ gr and we use an a 
posteriori knowledge of the global Dirichlet to Neumann map that is based on the pure linear 
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convergence/divergence of the RAS to define the coarse space (equivalent of the definition of 

Uo). 

The plan of this paper is the following. Section |2] will derive the Aitken-Schwarz precon- 
ditioning. Section [3] will focus on coarse approximation of the solution at the artificial in- 
terfaces, notably with a random set of orthogonal vectors and an orthogonal set of vectors 
obtained through the SVD of the Schwarz interface solutions. Then, section |4] proposes a 
study of convergence of the ARAS class preconditioners. Eventually numerical tests are pro- 
vided on academic problems in section|5]and industrial problems in section|6l 

2. Aitken-Schwarz method derived as preconditioning technique. In this section we 
study the integration of the Aitken's acceleration into a Richardson process in order to formu- 
late a preconditioning technique based on Aitken. More precisely, we propose to enhance the 
RAS preconditioning technique, presented in section[T] by the Aitken's acceleration. We first 
present the mechanism of the method and develop the equation to extract a corresponding 
Richardson's process. Then we point out that the method in its simple form does not exhibit 
the complete acceleration after one application and need an update as when the method is 
used as solver The result is a multiplicative preconditioner based on the Aitken RAS precon- 
ditioner Finally we present those preconditioners in their approximated form in order to save 
computing. 

2.1. The Aitken Restricted Additive Schwarz preconditioner: ARAS. Let Tt = 

{Irtii s — R'[s)Wi^s be the interface associated to Wij and F = U^^j^F^ be the global in- 
terface. Then U|r G M" is the restriction of the solution u G K™ on the F interface and 
ej^P — itj^p — u'^ is the error of an iteration of a RAS iterative process, equation ( 12.11 ) at the 
interface F. 

u'^ = u"-' + M^\, ,{f - Au'^-^) (2.1) 

In section [T] we wrote that the Schwarz iterative method has a pure linear convergence. 
This property enables us to use the Aitken's technique presented the same section. The pre- 
viously mentioned Q iterative process is replaced by a RAS iterative process. 

Using the linear convergence property of the RAS method, we would like to write a 
preconditioner which includes the Aitken's acceleration process. We introduce a restriction 
operator i?r G M"^™ from W to the global artificial interface F, with i?ri?f = In- The 
Aitken Restricted Additive Schwarz (ARAS) must generate a sequence of solution on the 
interface F, and accelerate the convergence of the Schwarz process from this original se- 
quence. Then the accelerated solution on the interface replaces the last one. This could be 
written combining an AS or RAS process eg. (12. 2a) ) with the Aitken process written in M™'^'" 
eq.( l2.2bb and subtracting the Schwarz solution which is not extrapolated on F eq.( l2.2cl i. We 
can write the following approximation u* of the solution u: 

u* - u^-i + Mj,\sAf - ^"'■"') (2.2a) 
+ Rf {In - P)'^ (lifr - Pu^f"^) (2.2b) 
- R^InRr + M-\g ,{f - Au'^-i)) (2.2c) 

We would like to write u* as an iterated solution derived from an iterative process of the 
form u* — u^^^ + M^^^^ s {f ^ Au*^^^), where M^^^g ^ is the Aitken-RAS precondi- 
tioner 
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First of all, we write an expression of eq. (l2.2bt depending on eg. (12. lb and which only 
involves the iterated solution follows: 

= i?? (/„ - Py' Rr {R^InRru'' - rIpRtu''-^) 
i with eq.dmi 

= Rl {In - Py' Rr (RtIuRt [u''-^ + M^\s , (/ - Au^-^)) 

-R^PRtu^-^) 
= Rl [In - Py' RTRrlnRr (u'''^ + M^\s,5 (/ - Au""^^)) 

(/„ - Py'^ RtRIpRvu^'^ 
= R^ (/„ - Py' Rr (u''-' + M«As., (/ - Au^-')) 

{in~py^ prtu''-^ 

Then, we re-write eg. (12. 2b with this new expression of eq. (l2.2bb as follows: 

(/„ - Py^ Rr [u'^-' + Mj,'^s,s (/ - Au'^-')) 
(/„ - Py' PRru''-' - RpnRr (^^'"' + Mj,\gsif - Au'-')') 
i factorizing by (u''-^ + M^\gj^ (/ - Au^-'^)^ 

= (/,„ - R^InRv + Rr (In - Py^ Rr) + Af-^^^ (/ - Au''-')) 

{In - Py'^ PRru''-^ 
I isolating u''^^ from Mj^\g s {f - Au''~'^) 

= u''-^ + (^-R^InRr + R? {In - Py' Rr - Rr {In - P)"' PRr) u^~^ 

+ (in, - R^InRr + i?F {In - Py' Rr) M^Xss (/ - Au''-') 

One can simplify E = (^i?^ (/„ - Py'^ Rr - i?f (/„ - Py^ PRr) as follows: 

E^R^ {In - Py' Rr (R^InRr - R]^PRr) 
= i?f (/„ - Py' i?ri?r (In - P) Rr 
= R^ {In - Py\ln - P) Rr 
= Rr^nRr 

And then writes, 

u* = u''-^ + {-R'^InRr + RrlnRr) u^~'^ 

+ (/„ - R^InRr + Rr {In - Py^ Rr) M^j^^ (/ - Au''-') 

= u''-' + [l,n - RrlnRr + Rr (In - Py^ Rr) M«As,5 (/ - Au''-') 

= u""-' + (in, + Rl ({In - Py' - In) Rr) Mj^^g g (/ - Au''-^) 

8 



Hence the formulation eq. (l2.2l l leads to an expression of an iterated solution u*: 

K (/,„ + i?F ((/„ - Py^ - In) Rr) M^\s, (/ - Au^-^) 



u 



This iterated solution u* can be seen as an accelerated solution of the RAS iterative pro- 
cess. Drawing our inspiration from the Stephensen's method |38 |, we build a new sequence 
of iterates from the solutions accelerated by the Aitken's acceleration method. Then, one 
considers u* as a new and writes the following ARAS iterative process: 



-1 + (/™ + Rl ((/„ - P)-^ - In) Rr) Mflls,^ (/ - Au^-i) (2.3) 
Then we defined the ARAS preconditioner as 



^ARAS.S 



P 

Irn + Rr {{In - P)'' - In) Rf) J2 ^lsK]R^,S (2.4) 



Remark 1. The ARAS preconditioner can be considered as a two-level additive precon- 
ditioner The preconditioner consists in computing a solution on an entire domain applying 
the RAS preconditioner and add components computed only on the interface T. 

2.2. Composite Multiplicative form of ARAS: ARAS2. If P is known exactly, the 
ARAS process written in the equation (12.3) needs two steps to converge to the solution u 
with an initial guess = 0. Then we have: 

Proposition 2.1. If P is known exactly then we have 



-1 

AFIAS,5 



that leads (^I ~ Af^jj^g ^'^ nilpotent matrix of degree 2. 

Proof We consider the postulate: "If P is known exactly, the ARAS process written in eq. ( 12.31 1 
needs two steps to converge to the solution with an initial guess vP — 0". 

We write the two first iterations of the ARAS process for any initial guess vP £ M™ and 
for all f e M™.- 



y2 ^ ^1 , ^,r-l 



And the second iterations leads to: 

Maras.s (/ - Au') 
+ ^IAas^ (/ - ^«°) + M2ras,8 {f-A (n" + (/ - An'))) 

Let vP — 0, then, 

= MIAa5, J + M^RAS., [f-A (M^],j,s,sf) 
— u 

Since this expression is true for all f G we can write: 

/I — '^"■-^ARAS,S ^^-^ARAS,5^^^-^ARAS,S 
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Now we can write: 

= ^ARAS.sf + (^m ^ ^'^ARAS,S^^ ^ARAS.sf 
J, w/f/i Aw = / 

= ^''^ARAS.S^^^ + (^'" ^ ^^aAaS.iS"^) ^'^aAaS,*'^"" 

77iMs, 

Which is equivalent to 

= (/„ - M^j^^g gA^ \,yue 

Hence ^/„j — M^j^^g gA^ is nilpotent of degree 2. [] 

The previous proposition leads to an approximation of A^^ written from the 2 first iter- 
ations of the ARAS iterative process (I2.3l l. Those 2 iterations compute the Schwarz solutions 
sequence on the interface needed in order to accelerate the Schwarz method by the Aitken's 
acceleration. We now write 2 iterations of the ARAS iterative process (12. 3t for any initial 
guess and for all u^~^ G K™. 

+ MaRASJ (/ - ^ + M^RAS,5 if ~ A^'-'))) 

+ ^^ARAS,5f ~ ^'^ARAS.S^'^'' ^ ~ ARAS, S^-^ ARAS, S (/ 

= + 2A^IAa5,. (/ - ^"'■"') - M^RAS,sAM-,'^^s,s (/ 
= + [2MX'n^s,s - Maras,sAM^ras,5) (/ " Au^^') 

Then we defined the ARAS2 preconditioner as 

^'^ARAS2,S — '^^'^ARAS,S ~ ARAS, S^^'^ ARAS, S (^'^^ 

Remark 2. According to the linear algebra literature about preconditioning technique 
j^, the ARAS2 preconditioner can be considered as a composite multilevel preconditioner. 
Actually, theARASl preconditioner is a multiplicative form of ARAS which is itself an additive 
preconditioner adding an operation on the entire domain with RAS and an operation on a 
coarse interface with the Aitken formula. 

10 



-Au''-^) 



2.3. Approximated form of ARAS and ARAS2. As the previous subsection suggests, 
since P is known exactly there is no need to use ARAS as a preconditioning technique. 
Nevertheless, when P is approximated, the Aitken's acceleration of the convergence depends 
on the local domain solving accuracy, and the cost of the building of an exact P depends 
on the size n. This is why P is numerically approximated by Pu as in ||39l . defining q < 
n orthogonal vectors Ui G M", defining the columns of the matrix Ug G i?"^"?. Then 
it makes sense to use Pu, in the ARAS preconditioning technique to define the ARAS(q) 
preconditioner: 

p 

MlHASi,).s = {im + i?FU, ((/, - Pv,y' - I,) V^Rr) J2 RhKlR^^s (2.6) 

i=l 

and ARAS2(q), 

^^ARAS2(q),5 ^ '^^^ARAS(q),5 ^ ^'^ARAS{q),S^^'^ARAS(q),S ^^'"^^ 

Different kind of approximation techniques of the error transfer operator matrix P have 
been proposed in the work of Il24l l26l |3] |2] [21] ■ Nevertheless, we are interested in alge- 
braic ways to compute an Aitken acceleration. In section |3] we proposed two fully algebraic 
approaches. 

3. Basis to approximate the interface solution. We focus here on an algebraic way 
to compute an Aitken acceleration of a sequence of Schwarz solutions on the interface. The 
global approach consists on an explicit building of P computing how the spanning vectors 
Ui are modified by the Schwarz iteration. Figure ISTI illustrate s the steps for constructing the 
matrix P. Step (a) starts from the spanning vector on the interface i?p C/^ and gets its value 
on the interface in the physical space. Then step (b) performs a complete Schwarz iteration 
with zero local right hand sides and homogeneous boundary conditions on the others artificial 
interfaces. Step (c) decomposes the trace solution on the interface in the spanning vector set 
Vq. Thus, we obtain the column i of the matrix P. 




(a) [b] (c) 



Figure 3.1. Steps to build the P matrix 

The full computation of P can be done in parallel, but it needs as much local domain 
solution as the number of interface points (i.e the size of the matrix P). 

Its adaptive computation is required to save computing. This methodology was first used 
with Fourier basis functions IISTl [T5l . This section focuses on the definition of orthogonal 
"base" Uq that will extend this adaptive computation in a general context. In the following, 
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we use the term "base" to denote a spanning vectors set that defines the approximation space. 
The key point of these preconditioners's efficiency is the choice of this orthogonal "base" Vq. 
It must be sufficiently rich to numerically represent the solution at the interface, but it has to 
be not too large for the computation's efficiency. 

We first propose a "naive" approach consisting of selecting an arbitrary set of orthog- 
onalized random vectors to generate the space to approximate the solution. Secondly, we 
represent the solution in a space arising from the singular value decomposition of a sequence 
of Schwarz solutions. Doing this, we assume to represent the main modes of the solutions. 

3.1. Orthogonal "base" arising from an arbitrary coarse algebraic approximation 
of the interface.. A choice consists in having a coarse representation of the interface's solu- 
tion u G M" from an algebraical point of view. Nevertheless, it is not possible to take a subset 
of q vectors of the canonical base of R", as if some components of u are not reachable by the 
"base" Ug, then the approximation \ \u — lLJg(U^u)|| will be very bad. This reason leads us 
to define as a set of orthogonal vectors where each component is coming from a random 
process in order that each vector can contribute to a part of the searched solution at the inter- 



face. We split q such as g = > qi and we associate qi random vectors to the interface Ti, 



1 < i < p. Then these qi vectors are orthogonalized to form q^ columns of the "base" Vq. 

This strategy is hazardous but can be a simple way to improve the convergence of a 
Schwarz process without knowledge of the problem and the mesh. 

The orthogonal "base" is obtained applying the same principle as illustrated in Figure 
13.11 leading to Algorithm[3] 



Algorithm 3 Vectorial Aitken's acceleration in an arbitrary built space without inversion 
Require: G ■ R" —> K" an iterative method having a pure linear convergence 

1: Compute q random vectors Vi £ M" following a uniform law on [0, 1] 

2: Orthogonalize those q vectors to form e M"^'^ 

3: Apply one iterate of Q on homogeneous problem, — > = 0{^q) 

4: Set P = VqW 

5: i = {Iq - P)-^ {u^ - P u°) {Aitken Formula} 

6: e = 



The lack of this method is that there is no possibility to control the quality of the base 
to perform the acceleration. A more controllable method will be preferred. In the following 
subsection, we propose a different starting point to build the base. The main idea will be in 
the fact that we can compress the vectorial sequence using a Singular Value Decomposition. 
Since the Vq matrix is built, P is built the same way. 

3.2. Approximation compressing the vectorial sequence. A totally algebraic method 
based on the Singular Value Decomposition of the Schwarz solutions on the interface has been 
proposed when the modes of the error could be strongly coupled |39|. This method offers the 
possibility for the Aitken Schwarz method to be used on a large class of problem without 
mesh consideration. Moreover, when computing an Aitken acceleration, the main difficulty 
is to invert the matrix [E"-~^, . . . , i?"] which can be close to singular In a computation, most 
of the time is consumed solving some noise that does not actually contribute to the solution. 
The singular value decomposition offer a tools to concentrate the effort only on the main parts 
of the solution. 



p 
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3.2.1. The singular value decomposition. A singular-value decomposition (SVD) of a 
real n x m {n > m) matrix A is its factorization into the product of three matrices: 

A = UI]V*, (3.1) 

where U = [Ui, . . . , Um] is an x m matrix with orthonormal columns, E is a n x to non- 
negative diagonal matrix with ~ o^i, I < i < m and the raxra matrix V = [Vi, . . . , Kn] 
is orthogonal. The left U and right V singular vectors are the eigenvectors of AA* and A* A 
respectively. It readily follows that Avi — aiUi, 1 < i < m 

We are going to recall some properties of the SVD. Assume that the ai,l < i < m are 
ordered in decreasing order and there exists r such that ctj. > while (Tr + 1 = 0. Then A 
can be decomposed in a dyadic decomposition: 

A = CriUiV* + (J2U2V* + ...+ CTrUrV;. (3.2) 

This means that SVD provides a way to find optimal lower dimensional approximations of 
a given series of data. More precisely, it produces an orthonormal base for representing the 
data series in a certain least squares optimal sense. This can be summarized by the theorem 
of Schmidt-Eckart-Young-Mirsky: 

Theorem 3. 1 . A non unique minimizer X^, of the problem toitlx ,rankx=k \\A-X\\2 = 
iJk+i{A), provided that ak > Uk+i, is obtained by truncating the dyadic decomposition of 
(l321l to contain the first k terms: = aiUiV{ + (T2C/2V^2* + • • • + f^kUkV^ The SVD 
of a matrix is well conditioned with respect to perturbations of its entries. Consider the 
matrix A, i? G R", the Fan inequalities write (Tr+s+i{A + B) < ar+i{A) + as+i{B) with 
r, s > 0, r + s + 1 < n. Considering the perturbation matrix E such that | | = 0(e), then 
\(Ti{A + E) — o-i{A)\ < (Ti{E) = ||i?||2, Vi. This property does not hold for eigenvalues 
decomposition where small perturbations in the matrix entries can cause a large change in the 
eigenvalues. 

This property allows us to search the acceleration of the convergence of the sequence of 
vectors in the base linked to its SVD. 

Proposition 3.2. Let {u^)i<k<q q successive iterates satisfying the pure linear con- 
vergence property: — u°° = P{u'^~^ — u°°). Then there exists an orthogonal base 

= [f7\ f/^, . . . , [/«] of a subset o/R" such that 

• af — (TiV^i with (cr;);gN decreasing and |V^*J < 1 => V/ G {1, q}, 

• u'' = ELi(^fu',yke{i,...,q} 

One can write: 

where P U*PUq. Moreover (a^": ■ ■ ■ ^ o^^)"^ obtained by the acceleration process rep- 
resents the projection of the limit of the sequence of vectors in the space generated by Ug. 

Proof By theorem 3.1.3 there exists a SVD decomposition of [u"'^, . . . , u*] — UgSV* and 
we can identify af as uiV^i. The orth onormal property ofY associated to the decrease ofai 
with increasing I leads to have af bounded by \ai\: VZ G {1, ...jq}, \af\ < \(Ji\. 
Taking the pure linear convergence ofu^ in the matrix form, and applying Ug leads to: 

u*(u''' - u°°) = u;pUqU;(?i'=-i - u°°) (3.4) 
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where ijj°)i<j<q represents the projection ofu°° on the span {C/i, . • . , Uq\. [] 

We can then derive Algorithm|4] This algorithm is similar to Algorithm|2]since the error 
transfer operator is defined using the errors of the linear iterative process in a space arising 
from the Singular Values Decomposition of g + 2 successive iterates. Therefore the third step 
of Algorithm|2]is equivalent to the sixth step of Algorithm^ 



Algorithm 4 Vectorial Aitken's acceleration in the SVD space with inversion 
Require: Q : M" — > M" an iterative method having a pure linear convergence 
Require: {u^)i<k<q+2, q + 2 successive iterates of Q to solve the linear system Au = f, 
starting from an arbitrary initial guess u'^ 
1: Form the SVD decomposition of F = . . . , -ui] = VSV'^ 

2: Set I the index such that I — maxi<i<,„+i {S{i, i) > tol}, {ex/.tol = 10^^^.} 

3: Set Yl:i,l:i+2 = Sl:l^l:lVli^g_i.g^2 

4: Set El;lA:l+l — yi:l,2:l+2 — + l 

5: if i?!:;^!:; is non singular then 

1: y^i 1 = (/; - {Yi:u+i - PYi-.i.i) {Aitken Formula} 

9: end if 

Proposition 3.3. Successive applications of Algorithm^converge to the limit u°°. 
Proof Ai the sequence of vector u'^ converges to a limit u°° then we can write 

E = [u\...,u'J] = [u°°,...,u°°]+E 

where E is a nxq matrix with decreasing coefficients with respect to the columns. The SVD of 
S°° = [u°° , . . . , leads to have = and (Ti(S°°) = 0, i > 2. The fan inequalities 
lead to have a i{E) < (Ji{E) = ||£'||2,i > 2. Consequently, successive applications of 
Algorithm^decrease the number of non zero singular values at each application. [] 

In Algorithm |4] the building of P needs the inversion of the matrix Ei-i.id which can 
contain very small singular values even if we selected those greater than a certain tolerance. 
This singular value can deteriorate the ability of P to accelerate the convergence. If it is the 
case, we can proceed inverting this matrix with its SVD, replacing by zeros the singular values 
less than a tolerance instead of inverting them (see numerical recipes |20|). A more robust 
algorithm can be obtained without inverting Eia.id- It consists in building P by applying 
the iterative method Q to the selected columns of Vg that appears in Algorithm |4] Then 
^ = ^i-n i z^(Ui:n,i:;) as donc in Algorithm|5] 

4. Convergence of ARAS and ARAS2 and their approximated form. As an enhance- 
ment of the RAS preconditioning technique, ARAS and ARAS2 should have a better conver- 
gence rate than the RAS technique. We formulate the convergence rate of a RAS technique 
considering the linear convergence of the Restricted Additive Schwarz method and extend 
this formulation to the Aitken's technique. Then we propose a relation between the spectral 
radius of those methods. 

In the following we note 

= (/ - M-M) (4.1) 
Any Richardson's process can be written as 



u'' ^T^u'' ^ + c, where c G M"is constant 
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(4.2) 



Algorithm 5 Vectorial Aitken acceleration in the SVD space without inversion 
Require: Q : M" — > M" an iterative method having a pure linear convergence 
Require: {u'^)i<k<q+2, q + 2 successive iterates of Q to solve the linear system Au — f, 
starting from an arbitrary initial guess u'^ 
1: Form the SVD decomposition of F = ...,«!] = VSV'^ 

2: Set the index I such that I = maxi<i<g_|_i {S{i, i) > tol}, {ex.:tol — 10~^^.} 
3: Apply one iterate of Q on homogeneous problem, with I + 2 initial guesses U-.^i-.i 

4: SetP = U*i^;M/:,l:i 

5: Set Yl:;, 1:2 = Sl.,l^l:lVli^g_^_^.^^2 

6: 1 = (/; - P)-^ {Yi..i,2 - P Yij^i) {Aitken Formula} 

7: =U:,1:, y^t. 



Remark 3. As the ARAS2 iterative process correspond to 2 iterations of the ARAS process, 
we notice that TaraS2 = T'l^yis- 

4.1. Ideal case. When building the ARAS preconditioner, we exhibit the fact that TarasW 
is nilpotent when the error's transfer operator on the interface F, if P is considered exact. This 
property gives the following proposition: 

Proposition A.l.IfP is known exactly then, 

P {Taras2) = P (Taras) = (4.3) 

Proof If P is known exactly then Proposition 12.71 is verified and Taras ond TaraS2 <^re 
nilpotent. The spectral radius of a nilpotent matrix is equal to 0. [] 
Remark 4. Obviously, p{Taras2) = P {Taras) < p{Tras) 

But the matrix P is often numerically computed and then p {Taras) is no longer equal 
to 0. The value of p {Taras) depends on the accuracy of the local domain solutions and 
when P is written in another space, depends on the quality of this space. In the following we 
propose a framework to study the convergence of TARAS(q) TARAS2{q)- The goal is to 
provide key elements to understand the influence of approximating P in an orthogonal base 
on the preconditioner 

4.2. Convergence of RAS for an elliptic operator. In this subsection we express the 
convergence rate of a RAS iterative process considering its convergence on the artificial in- 
terfaces in proposition l4.2l Since we can link the convergence of RAS on the entire domain 
to the convergence on the interface, it becomes possible to study the effect of modifying the 
error's transfer operator P. 

Proposition 4.2. Let Abe a discretized operator of an elliptic problem on a domain VL. 
Let us consider a RAS iterative process such as Tras — I ~ -^ras^ defined on p domains. 
The data dependencies between domains is located on an artificial interface T. Then there 
exists an error's transfer operator on the interface V, P such as there exists a norm \ \.\\ for 
which \ \P\ \ < 1. The convergence rate of Tras is 

p(rfl,As)-max{|A|.- AgA(P)} (4.4) 

Proof In the case of elliptic problem the maximum principle is observed. Then, for the 
Schwarz method the error is maximal on artificial interfaces. We write the error of a Schwarz 
process starting for the definition of the RAS iterative method: 

u''+' = Trasu'' + Mj^Xgb (4.5) 
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The convergence of such a process is given by ^7]/ 4721/ ; 



= r^Ase° (4.6) 



On the interface, one can write: 



ef=p = P'^ej'r (4.7) 
r/ie error is maximal on the interface thus, 

lle'^IU = llefrlloo (4.8) 
Equations Ii4.6[ . ( I4.7I > and ( 14.81 ) /eoii fo 

l|rW°lloo = ll^'^ej'rlloo < ||P*=||oc||e|'rl|oo (4.9) 

Then we can write, 

sup (l|T^Ase°lloo) = sup (llP'^efplu) (4.10) 

l|e°||=e = l l|e|'rll~=l 

^ll-P^-'IU (4.11) 



Hence, 



lim IIP'^II^ =p(rjiAs) (4.12) 

fc— >oo 



4.3. Convergence of ARAS and ARAS2 in their approximated form. As we men- 
tioned previously, there exist different approaches to approximate the error transfer operator 
P. For a fully algebraic approach, one will choose the approximation of the operator in a 
basis built explicitly as we described in[3] When it is possible, one can build a complete base 
analytically and make an approximation of the operator in this base by truncation. In the 
following, we choose the analytical approach to study the convergence of the method. 

Here we focus on elliptic and separable operators. We propose a theorem giving the 
convergence rate of an ARAS iterative process when the error's transfer operator can be 
exactly computed in a space spanned by the eigenvectors of P and then truncated to provide 
an approximation of the error transfer operator in the physical space. 

Theorem 4.3. Let Abe a discretized operator of an elliptic problem on a domain f2. 
Let us consider a RAS iterative process such as Tjjas ^ I ^ -^^ras^ defined on p domains. 
Let the error transfer operator P on an interface T be diagonalisable. If P is diagonalisable, 

its decomposition in eigenvalues leads to have P = UPU~^ where for i G [liJ^l, P = 
diag(\i). The error on the interface V in the approximation space follows e^^^ — P<i\Y- 
Each mode converges linearly and independently from the others following e^-^l ~ ^i^\r i- 
Let Q\ G R"^" be a diagonal matrix such that qi^lifl<l<q and qi ~ if q < I. 
And let Q\ = /„ — Qx. A coarse approximation of P can be done choosing a set of q strong 
modes as P ^ Q\P- Writing the preconditioner as: 

MaraS(,),5 = + i??U -Py'- Ir?j V'' R^j (4.13) 
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The spectral radius ofTARAS{q) ■ 

p{TARASig)) = PiQ>^P) = Vi < min{|A| : A e \{QxP)} (4-14) 

Proof We consider the assumptions of the theorem and the formula of the approximated pre- 
conditioner \4.13\ The equation (12.21 1 can be written for ARAS(q) such as: 

+Rl {In -Py^ (ufr - Pn\y^) Rr 
-RllnRr {Trasu''-^ + Mj^Xsb) 
We consider that on the interface: 

{Trasu''-' + Mj,\sb)\r = P'^tr' + ^ (4.15) 

With c e M", a constant vector independent of u\r- 

Extracting the interface's solution in the approximation space, 

^fr = P^fjr^ + £ + [in - P) ' (^fr - ^^|r') - QxP^T^ - Q\c 

Then, 

ufr = Oa^^IT' + Qa^^ + Qac (4.16) 
The error on the interface is 

- #r = ^1? - QxP^T^ - Qa^^ - Qx£ 

Thus, 

l\^^Q^{iL'^-h\^^ -t) (4.17) 
Regarding equation ( 14.151 1, Puj^jT^ + c = uj^p, and then, 

{l'^ - P^^^ -h = ii't = Pe-T (4-18) 

Hence we write, 

= Qa^'IIT' (4-19) 

We showed that the ARAS iterative process has an error's transfer operator, Q\P, equal 
to the part of the error's transfer operator P that we did not compute. We note that \ \Q\P\ \ < 

\\h<i- 

A is a discretized operator of an elliptic problem, then we can apply ProDosition \4.2\ and 
write: 

piTARASig))^ PiQxP) (4.20) 

Then we proof eauation \4~T4\ [] 

Remark 5. For a separable operator, the error transfer operator on an interface between 
two domains is diagonalisable 1241 . Then, the error transfer operator P on an interface T 
is diagonalisable for a global operator which is separable and for which the interfaces of all 
the domains are parallel to one discretization direction. 
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4.4. Convergence study in the case of the 2D Poisson's equation. We consider a 
simple case of an elliptic problem on a rectangle defined in equation (14.21b . The problem is 
discretized by 2D finite differences and decomposed into 2 domains ili and ^2 of the same 



size. 



~Au ^ f, mn = [0, 1] X [0, tt] 
= 0, on dfl 



(4.21) 



y.. 



Figure 4. 1 . 2D domain decomposition for Poisson's equation 



In the Fourier base, for a separable operator with 2 artificial interfaces, the acceleration 

Pr 



is written with P 



( - 



such as 







4-' 



(4.22) 





We study the case of a regular grid and write the semi-discretized 2D Poisson's operator 
on [0,ri] X [0,7r] U [r2, 1] X [0,7r], Ti > We consider A'^j; + 1 in the direction x. The 
overlap in terms of number of step size in direction x is denoted by S. And we denote by A; 
the eigenvalues of the discretized operator — considering a second order finite difference 
scheme. We find the coefficient of the matrices Pr- solving the equations: 



hi 



+ A/u2 



0,^ e II, iV:. 


1 

0,^e [l,iV, 
1 




-11 



-11 



The roots of those equations are such as. 



2 + Xihl + ^Xfh^^+iXihl 



r2 = 



2 + Xihl - ^Xfht+AXihl 



(4.23) 
(4.24) 

(4.25) 
(4.26) 
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Then for the first domain fli the solutions have the form: 



Thus, 



r, — 



(4.27) 



(4.28) 



appears to be diagonal and its diagonal coefficients (5; can be analytically derived 

such as: 



(4.29) 
(4.30) 



Exactly the same development can be done to find (5;,r2- 

Remark 6. If the two domains have the same size, then 5i^Tx — ^i,T2 

The P matrix has the form: 



P 



( 



^1, 



V 



, with 1 > (5i,r, > ... > <^„r,- (4.31) 



/ 



For the sake of simplicity we consider that the domain fii and ^2 have the same size. 
We note 81 = (5; Ti = Si^r2- Then we can calculate the determinant of {P — A/„): 



det{P - XI,,) ^Y[{Si - X){Si + X) 



1=1 



Hence the spectrum is {Si, (5„, Si, — 5„}. 

We showed that the error's transfer operator is diagonaUsable for this problem. P can be 
written in a base U of eigenvectors as follows: 



P = UPU"\ with P = diag{Si, Sn, Si, S„) 



(4.32) 



Then we can estimate the convergence rate of the RAS, ARAS(q) and ARAS2(q) apply- 
ing Theorem l4.3l 



p{Tras) = 61 

P{TARAS(q)) = 
p{TAB.AS2(q)) = ^l+l 



(4.33) 
(4.34) 
(4.35) 



Because the eigenvalues and the values of Pr^ are equal we can verify a correspondence 
between the approximation by truncation in the eigenvectors space and the Fourier space. 
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Selecting the first Fourier mode corresponds to selecting the highest eigenvalues. Let us 
introduce the transfer matrix Cp; from the real space to the Fourier space and the transfer 
matrix Dr^ from the Fourier space to the real space: 



Cr. : M" — > C" and L»r. : C" — > M" 
e|r, I — > e\r, e|r, < — > eir^ 

Then we write 



The approximation is done by applying the operator 



gr„^ 

where Qr^.j^ — diag{qi), qi ~ 1 if 1 < I < q and qi = Q if q < I. Then we write the 
preconditioner 

Marasm = + - ^^^)"' - In)CRr)M]^Xs (4-37) 

As previously we introduce a matrix Qjr = (/ — Qjr). 

We can then follow the demonstration done in the proof of theorem |43] writing the error 
on the interface in the Fourier space as 

- = u^ ~ QrPu\r^ - Qtu^ ~Qrc (4.38) 

Thus, 

efr = (wj? - Pu\-^ - c) (4.39) 



Regarding equation (14.151 1. -Pufp ^ + c = {tfp, and then. 



?^efr (4.40) 



As efr- = ^efv we write 



efp = Q^Pe\-^ (4.41) 

We showed that the ARAS iterative process has an error transfer operator, Qj^P, equal 
to the part of the error transfer operator P that we did not compute. We note that | IQa^*! | < 
II^'II<1- 

We can apply Proposition 14 . 2 1 and write: 

p(TARAS(q)) ^ piQ^P) (4.42) 
The conclusion becomes the same as applying Theorem l4.3l 

We pointed out here the way the approximation of the error's transfer operator affects 
the convergence of Schwarz iterative processes in the case of a separable operator for a two 
domain decomposition. This enables us to understand the philosophy of approximating the 
matrix P in different spaces and links the works done in 
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5. Results on academic problems. 

5.1. 2D theoretical study. The goal of this section is to validate the ARAS method 
on a simple case where the ARAS preconditioner can be written analytically and where we 
can apply Theorem 14.31 We consider the 2D problem decomposed in 2 domains presented 
in subsection 14.41 The grid size is about 32 x 32. This subsection provides the theoretical 
framework we implement in Matlab. Here, we verify numerically the theoretical results given 
previously. 

We build the matrix P G C'^°^^°. Only the internal points are taken, leading to 30 
modes which can be accelerated. Those modes decrease from 0.8106 to 0.1531. We decide 
to compute the entire P and a truncated one of size q = 15, Qj^P- Figure ISTI shows the 
coefficient of the matrix computed. The goal here is to retrieve the convergence rate given by 
the application of theorem 143] in equation ( 14.33b . The convergence rate of a ARAS(q) type 
preconditioner is related to the coefficients of P denoted by Si. 



1 
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Figure 5.1. Diagonal coefficients of P and Qj^P corresponding to the modes to accelerate. 



For q~ 15 we note that. 



5i = 0.8106 
5q+i = 0.2535 
(5^^i 0.0643 



(5.1) 
(5.2) 
(5.3) 



Then we compute the RAS, ARAS(q) and ARAS2(q) preconditioners and compute for 
each preconditioner of type * the spectral radius of T* and the conditioning number in the 2 
norm of M^^A. The convergence for each appropriate stopping criteria is 10^^". 

Table 15. II shows the numerical convergence rate of RAS, ARAS(15), ARAS2(15), and 
ARAS2(30) for the test case presented in Subsection 14.41 The numerical values obtained for 
pTf, match perfectly the theoretical results. It exhibits also that the Aitken acceleration of 
the RAS enhances the condition number of the preconditioned problem. In accordance with 



the theory, when q 
numerically. 



30, the size of the artificial interface, P is exact and M 



ARAS2 



A- 
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prec. * 


p{T.) 


k(M-^A) 


It. Rich. 


It. GCR 


RAS 


0.8106 


30.0083 


96 


18 


ARAS(q=15) 


0.2535 


5.2358 


14 


7 


ARAS2(q=15) 


0.0643 


1.1451 


7 


5 


ARAS2(q=30) 


1.4319 e-13 


1.0000 


1 


1 



Table 5.1 

Numerical performance of RAS, ARAS and ARAS2 on the 2D Poisson problem. 



Moreover the number of iterations of the iterative process ARAS(q=15) is twice the 
number of iterations of the iterative process ARAS2(q=15). 

5.2. Observing the influence of the partitioning and the approximation space on a 
2D Helmholtz problem. Here, we focus on the influence of the partitioning chosen to set 
up the domain decomposition method. We also focus on the influence of the choice of a base 
to approximate the Aitken's acceleration. When the mesh is known it is possible to partition 
the operator following a geometric partitioning. One point is to see what can happen if we 
partition the operator with a graph partitioning approach such as METIS. Another point is 
to see how the choice of a base influence the performance and the cost of the ARAS type 
preconditioner 

Let us consider the 2D Helmholtz problem (— cj — A)m — f mQ. — [{), 1]^, u — Con 51). 
The problem is discretized by second order finite differences with m points in each direction 
X and y giving a space step h — j^^z^- The set value a; = 0.98-p-(l — cos{-Kh)) is close to the 
minimum eigenvalue of the discrete — A operator in order to have an ill-conditioned discrete 
problem with Koo(^) = 1-7918 £; + 07 for m = 164. 




Figure 5.2. Partitioning into 8 parts on a 2D Helmholtz problem of size 164 X 164, (left) Physical Band 
partitioning, ( right) METIS partitioning. 

First we solve the problem with a physical band partitioning, and then we solve it with 
a METIS partitioning using eight sub-domains. Figure \52\ illustrates the physical band and 
the METIS partitioning using eight sub-domains. In the physical partitioning, borders are 
smooth, contrary to the METIS partitioning which creates corners and irregular borders. The 
corners give cross points which deteriorate the convergence of the Schwarz method. 

For each partitioning, we build the ARAS2 preconditioner in two different bases: 

• an orthogonal base arising from the application of the preconditioner RAS on a 
sequence of random vectors (see subsection l3.1b . 

• a base built from the successive Schwarz solution on the interface and passed in its 
SVD base (see subsection l3.2l l. 
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Remark 7. In the following, we denote by ARAS2(r=^) the preconditioner approximated 
in the "random" base. Because the number of vectors can be high for this kind of base, we 
choose to express the reduction number r in parentheses instead of q, the number of column 
ofVq, but the formula is still: 

Figure l53] (respectivelv Figure lSAt presents the Richardson process with ARAS2 and the 
ARAS2 preconditioned GCR Krylov method for the physical band partitioning ( respectively 
a METIS partitioning) using a random base or a SVD base. These results were obtained 
with a sequential Matlab code able to run small academic problems. The Krylov method 
used is the gradient conjugate residual GCR while the LU factorisation is used to solve the 
sub-domain's problems. 

These results exhibit: 

• Richardson processes in Figure ISJj converge while in Figure l54l onlv the RAS itera- 
tive process converges and the ARAS2 process diverges. Consequently the physical 
partitioning enables good convergence of the RAS which can be accelerated with 
the approximation of P by Pu^ . The METIS partitioning slows the convergence of 
the RAS due to the cross points. Then, using a domain decomposition method as a 
solver with an algebraic partitioning can produce bad results when it is accelerated 
by Aitken. Let us notice that the full P makes the RAS process converge in one 
iteration. 

• Nevertheless, the Aitken-RAS used as a preconditioner is very efficient even on 
the METIS partitioning with cross points where the Aitken-RAS as a Richardson 
process diverges. This makes the Aitken-RAS a robust algebraic preconditioner. We 
must notice that the effect of the preconditioning with a METIS partitioning is less 
efficient than the one with the physical partitioning. 

• The better the base Vq is able to represent the interface solution, the better the pre- 
conditioner is for the random base and the SVD base. 

Let us observe the difference between the two choices of base to compute the accelera- 
tion. On the one hand, the choice of an orthogonal base arising from the application of the 
RAS preconditioner on a sequence of random vectors presents good advantages for a precon- 
ditioner. With this approach, the preconditioner can be used for different right-hand sides. 
However, the number of vectors necessary to describe the interface can be close to the size of 
the global interface, increasing the cost of the preconditioner. On the other hand, it is possi- 
ble to build the acceleration for many iterations of the Additive Schwarz process, computing 
the SVD of the interface solutions. Then the acceleration process is problem-dependent, but 
experience shows that a small number of iterations can enable a good approximation of Py^ . 

For a physical partitioning, we can evaluate the cost of each preconditioner For each sub- 
domain, the artificial interface is of size 164 for the uppermost or lowermost sub-domain, and 
164 * 2 = 328 for internal sub-domains. Hence, for p — 8 the global interface has a size 
of 164 * (6 * 2 + 2) = 2296. For r — 1 the base is complete and Py is exact. There is 
no need to use ARAS as a preconditioning technique. For all r the size of F is 2226 fjjgjj 
the number of M^\gX = y products to build Ai^j^^^ is 3 * While the number of 

M^\gX — y products to build Af^^^g for the base arising from SVD only depends on the 
number of Richardson iterations. 

Figure 15731 shows that for r — 8, n — 185 and the number of products M^^gX — y is 555, 
the convergence of GCR is reached in 11 iterations. For 24 iterations of Schwarz, we build a 
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Figure 5.3. Solving 2D Helmhohz equation on a 164 X 164 Cartesian grid, physical band partitioning, p = 
8,(left) ARAS2(r = ^ ) is built with a Random base, (right) ARAS2(q) is built with a SVD base, (top) Convergence 
of Iterative Schwarz Process, (bottom) convergence of GCR method preconditioned by RAS and ARAS2. 



matrix Ptj of size 24, which is around eight times smaller than with the previous base. The 
number of matrix products is 48, 12 times smaller than with the previous base. The number of 
GCR iterations is 15. Eventually, the cost of a good independent preconditioner is excessive 
compared to the one with the SVD. 

Figure 15.51 focuses on the eigenvalue of the error transfer operators when the base is 
computed from SVD and both partitioning. We compute all the singular values corresponding 

24 




Figure 5.4. Solving 2D Helmholtz equation on a 161 X 164 Cartesian grid, METIS partitioning, p = 8,(left) 
ARAS2 is built with a Random base, (right) ARAS2 is built with a SVD base, (top) Convergence of Iterative Schwarz 
Process, (bottom) convergence ofGCR method preconditioned by RAS andARAS2. 



to the number of interface points and select 24 singular values from this set of n values. For 
8 partitions with a manual partitioning, n — 2296 and with a METIS partitioning we obtain 
1295 interfaces points. We saw that the Aitken-RAS technique used as a Richardson iterative 
process diverges for the METIS partitioning. We study the spectrum of a preconditioner 
in the two cases and compare it to the spectrum of the RAS preconditioning method. For 
convenience we consider only a set of the 40 largest eigenvalues. 

The predicted values of the spectrum of the error transfer operator on the interface gives 
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Figure 5.5. Eigenvalues ofTARAs(<l = 24) compared to eigenvalues of Pjj^ for a 164 X 164 Cartesian 
grid, p = 8,(top) Band partitioning, ARAS is built with a SVD base computed with 24 singular vectors chosen over 
2296, (bottom) METIS partitioning, ARAS is built with a SVD base computed with 24 singular vectors chosen over 
1295. 



a good approximation of the spectram of the iterative process in the case of the band parti- 
tioning. Otherwise, for the METIS partitioning, it appears that the spectrum gives a good idea 
of the spectrum of the iterative method but differs for the first eigenvalues. Then it appears 
that the first eigenvalue of Taras is greather than 1 instead of the value computed for QP, 
which is less than 1. It should explain why the iterative process diverges. 

This empirical analysis shows the influence of a partitioning technique on the Schwarz 
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preconditioner. It is important to know that if the user has the entire knowledge of the lin- 
ear system he solves then he should provide a physical partitioning which can have smooth 
boundaries. Otherwise, if the sub-problems are non-singular then the Schwarz method used 
as a preconditioning technique is efficient but can present a lack of speed in the convergence. 
The second point is the choice of a good base. The two bases are efficient, but the one arising 
from the S VD presents the best choice for time computing considerations. 

6. Results on industrial problems. In section|5]we pointed out that a graph partitioning 
to define the domain decomposition and the algorithm of approximation using the S VD of the 
Schwarz solutions should be a good choice to algebraically build the ARAS preconditioner. 
We first propose an estimation of the computing cost and then apply the preconditioner to an 
industrial problem. 

6.1. Computing cost modelling. We want to evaluate the cost of building and applying 
an ARAS type preconditioner in terms of arithmetic complexity. We denote by AC{*) the 
arithmetic complexity of an operation *. Let considers a matrix A G This matrix is 

decomposed into p sub-matrices Ai G M."^'^"^^ . The decomposition leads to have an interface 
r of size n. The coarse interface is of size q. We denote by Xa G M" where a G N should be 
any of m, rrii, n or q. 

6.1.1. Arithmetic complexity of applying an ARAS type preconditioner. Let consid- 
ers the operation: 

p 

1=1 

The cost of such an operation mostly consists in the p operations A^^x„n which depends 
on the cost of a chosen method to inverse Ai such as a Krylov methods or a LU factorization 
or maybe an incomplete LU factorization. 

Then the cost of applying a RAS preconditioner is written as: 

{Mj^\gXrn) ^pxAC (Ar^xm,) (6.1) 
We now consider the operation: 

p 

i=l 

The cost of such an operation consists in one application of a RAS preconditioner, the 
base transfer operated by U,, and solving (Jg — Pu^ ) ?/g — Xq. Others or summing operations: 
1 sum between 2 vectors of size n and, one subtract between two vector of size q. 

Then the cost of applying a RAS preconditioner is written as: 

-^C {M^RAs^^) = P X {A-^x,n,) + AC (U^x™) 

+AC ((/, - PvX^ xq^ + AC (Uqx,) 
~t~^^C '^ij) ^AC {^x^ ~\~ Xji^ 

We note that: 

• An addition between 2 vectors of size n consists in n operations. 

• A subtraction between 2 vectors of size n consists in n operations. 
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• A scalar product between 2 vectors of size n consists in n product and n sum. 

• A multiplication between a matrix with n lines and m columns and a vector with m 
lines consists in n scalar products of vectors of size m. 

Then we write, 

AC (Xf, - x,i) = q 

AC (VqXq) = m X 2 X q 
AC (VgXm) = qx2x m 



And, 

(MXRAS^m) = PXAC (A-^Xra^ ) (6.2) 

+^c((/,-PuJ"'xq) (6.3) 
+4:Xqxm + n + q (6.4) 

Remark 8. In most cases the coarsening is such that q <^m. Then the cost 

AC (^{Ig - Pv,y' Xg) 

should be very small compared to the cost p x AC (A^ ^Xmi)- If it is the case 

(m^Aas(,)^-) = (M^Xs^m) + 0{m) (6.5) 

This means that the cost of one application of the ARAS preconditioner is close to the 
cost of one application of the RAS preconditioner when q <^ m. 

We now estimate the cost of applying an ARAS2 preconditioner: 

^ARAS2{q)^m = '^^ARASiq) ~ ^ARAS(q)^^ARAS{q)^"^ (^-^^ 

It consists in 2 applications of ARAS(q) and one matrix vector product on the entire 
domain. We note that the matrix A is sparse. So, denoting by nnz{A) the number of non 
zeros elements of A we write, 

AC{Axm.) = 2 X nnz{A) 

Hence, 

{Mxl,AS2iqfm) =2xAC (M-i^g(^)X„) + 2 X nnz{A) + 20{m) (6.7) 
Remark 9. For q <^m, 

AC (M^^^_52(g)2^™) = 2 X (M^AS^m) + 0{nnz{A)) + 0{m) 



28 



6.1.2. Arithmetic complexity of building the coarse space and Pu,- We focus 
here on the cost to build a base arising from the SVD of the Schwarz solutions on the interface. 
We refer to Algorithm|5]which proposes a robust way to implement the Aitken's acceleration 
without inversion. 

We compute q + 2 iterations of a RAS iterative process. It consists in applying the 
preconditioner on a vector Xm and summing the result with another vector of the same size. 
We can write 

AC{TRASXm) = AC{M^\sX,n) + 0{m) (6.8) 

Then we perform a SVD over a set of g + 2 vectors of size n, Xqj^2 G ]R"^('J+-^). 

^C(building U,) < (g + 2) X AC{TRAsXm) + AC{SV D{Xq+2)) (6.9) 

After this, we apply one iteration of the Schwarz iterative process on at most the q first 
left singular vectors to build Pu,, . Thus, 

ylC(building and PuJ <{q + 2) xAC{TRAsXm) 

+AC{SVD{Xq+2) + qx ACiMj^XsX„,) (6.10) 

Hence, 

y^C (building U, andPuJ < 2{q+l)xAC{M]^\sXm)+AC{SVD{Xg+2))+0{m) (6.11) 

The cost of building the coarse space and the error transfer operator depends on the num- 
ber q of vectors needed. Then the ARAS(q) preconditioner will be a good choice compared 
to RAS if q is sufficiently small compared to the number of application of the preconditioner 
involved in the Krylov iterative method. 

Remark 10. This computation, following the robust algorithm |5] is nearly two times 
costly than Algorithm^with inversion. In fact, the building of Pv^ with Algorithm^consists 
in inverting an error matrix of size q and multiplying it, on the left, by another matrix of size 
q. For simplicity we consider those operations of order 0{m). 

AC{buildingl]qandPvJ < {q + 2) x AC{M^\gXm) + AC{SVD{Xq+2)) + 0{m) (6.12) 

In order to save computing, Algorithm^with inversion is the best choice. 

6.1.3. ParalleUzation. It is important to note that the Restricted Additive Schwarz pro- 
cess is fully parallel, in the sense that the inverse of Ai can be computed independently by 
every single process i handling a domain i. Then for p processes, we can re-write the formula 
(lO : 

■^C (Mflls^™) = (Ar^x^^) (6.13) 

The parallelism leads to have a reduction of the matrix vector product. Then we write, 
for p processes: 

AC (VqXq) = nil X 2 X q 
AC (U^x,„) = q X 2 X nil 
AC{Ax,n) = 2 X nnzi{A) 

Table l6.11 shows the parallel arithmetic complexity in 0{mi) for building and applying a 
parallel ARAS(q) preconditioner 
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Task 


AC{SVD{Xq+2)) 




Building 


1 


q + 2 


Apply 





1 



Table 6.1 

Parallel complexity for building and applying an ARAS(q) preconditioner 



6.2. Application on a 31? CFD industrial case. We consider CASE_017 RM07 avail- 
able in the sparse matrix collection [13], which represents a 3D viscous case with a "frozen" 
turbulence. Here, the geometry is a jet engine compressor. The problem is discretized among 
54527 nodes. Seven variables per node are considered. The resulting matrix is of size 381689 
with 37464962 non-zeros. The matrix is not symmetric. 

We use a PETSc-MPI implementation with a PARMETIS partitioning taking into ac- 
count the block structure and the weight of each block. The matrix is partitioned in four 
parts, with a minimum overlap. We apply the ARAS and ARAS2 left-preconditioners with 
40 basis vectors computed following Algorithm|5] 

Figure 16.11 shows the convergence of the preconditioned GMRES and the convergence 
of the Richardson processes associated to the preconditioners. In order to disminish the cost 
of building phase, the tolerance is set to 10^® for the local solution, but only for the building 
phase.Then, the iterations to build Py^ take less cpu-time and memory allocation than one 
application of the preconditioner during the solution phase. The ARAS(q) preconditioner has 
been presented as a left preconditioner The stopping criteria used for the GMRES method 
implemented in PETSc is based on the relative residual. This is why the curves are not going 
to the same tolerance. We discuss this point with the following results. For a minimal overlap, 
without knowledge of the underlying equations and mesh design, the ARAS preconditioner 
is efficient, and also its mutliplicative version, ARAS2. 

Table \62\ shows the performance results corresponding to Figure [64] on an SGI Altix 
Xe340. While applying ARAS(40) or ARAS2(40), both Solution Time and Memory alloca- 
tion are reduced. The time involved in the building phase of the preconditioner depends of 
the choice of local factorization and the kind of algorithm chosen. The different times follows 
the formula of the arithmetic complexity presented in Subsection 16. II The relative residual 
for RAS and ARAS(q) is the same. We explain the difference of convergence between ARAS 
and ARAS2 for this case by the fact that the Richardson processes diverge and leads to an 
amplified inaccurary in ARAS2. 



Prec. 


Building 


Solution 


Max. Loc. Mem 


\\Ax-b\U/\\b\\, 




Time (s.) 


Time (s.) 


(M.O.) 




RAS 


8.684 


1552.89 


1068 


1.5704 e-09 


ARAS (40) 


429.943 


1086.63 


1048 


9.7492e-10 


ARAS2 (40) 


446.454 


1174.97 


1010 


2.2760e-07 



Table 6.2 

Solving 3D Navier Stokes equation (CASE RM07), PARMETIS partitioning with weights, p = 4, overlap 0, 
ARAS2 is built with a SVD basis, GMRES method preconditioned by RAS and ARAS (buih with tol = 10~^j. 



7. Conclusion. We presented an integration of the Aitken acceleration technique in the 
RAS preconditioning. This integration leads to a multi-level preconditioner One level is the 
entire domain, while the second is the entire artificial interface. Since the computation of the 
error transfer operator is costly, we propose an algebraic computation of a coarse space, built 
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Figure 6.1. Solving 3D Navier Stokes equation ( CASE RM07), PASMETIS partitioning with weights, p = 4, 
overlap 0, ARAS2 is built with a SVD basis, (left) Convergence of Iterative Schwarz Process, (right) convergence of 
GMRES method preconditioned by RAS and ARAS (built with tol = 10~^). 

from the SVD decomposition of Schwarz solutions on the interface. The results is a cheap 
and fully algebraic enhancement of the RAS preconditioner. An analysis of the convergence 
of this preconditioner is given when the basis is built analytically and shows the effect of 
the preconditioner depending on the choice of the mode to be accelerated. Finally, a result 
is provided on a 3D industrial case without knowledge of the underlying equations and the 
mesh design. 

Future work should focus on other algorithms to build algebraically the Aitken acceleration 

in order to reduce the time spent in the building time. Another issue concerns work on parti- 
tioning techniques for domain decomposition. Actually, there is no technique to provide local 
system with insurance of inversion. 
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